In reviewing the heatmap for speed, it was observed that parts of the patterns are consistent. To take the analysis a step further,the task in this notebook is to obtain characteristic statistics (mean and standard deviation) for each of the metrics - Speed, Flow and Occupancy. Because there may be variability with the day of the week, each day of the week is heatmapped.
One of the directions from the last meeting with capstone advisor, it was suggested that we create mean and standard deviation plots that show by day. For each day and each variable, there should be 3 heatmaps that show Mean, Mean - STD, Mean + STD. Each day should have nine plots.
Gaining insights across all days is difficult with the format of the notebook, so all pdf's were consolidated into one file. On paging through, shifts across the days are viewable.
As expected, the traffic patterns for Saturday and Sunday were very different from the rest of the days. On weekdays, we do see different patterns throughout the week. As the week progresses, the heatmaps show broader congestion throughout the week from Monday to Friday.
in progress
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
import datetime
import time
reduce_data_by_dict This function is a universal function that takes a dict and selects dataframes based on the identified key value pair.
def reduce_data_by_dict(df, keyval_dict):
for key, val in keyval_dict.iteritems():
df = df[df[key] == val]
return df
This function loads the raw, minute data (takes a LONG time to load). From this it selects only 1 freeway and direction. It may make sense to run this only once, but the data for each freeway could get rather large. There's an opportunity to use spark RDD's to increase performance
def get_fwy_data(_fwy,_dir, overwrite=False, output_csv=False, nimportrows=-1):
# Ensure proper naming convention for file
myname = "".join([ "i",
str(_fwy),
str(_dir)])
filepath = "".join(["../data/metric_statistics/",myname,".csv"])
# Check to see if the file should be overwritten or if it is not available.
try:
if overwrite == False:
freeway = pd.read_csv(filepath, sep='\t')
except:
overwrite = True
if overwrite:
# import raw data
raw_5_min_filepath = '../../../five_min_frame.csv'
if nimportrows != -1:
raw_5_min_data = pd.read_csv(raw_5_min_filepath, nrows=nimportrows)
else:
raw_5_min_data = pd.read_csv(raw_5_min_filepath)
raw_meta_filepath = '../../../d11_traffic_data/meta/d11/d11_text_meta_2015_01_01.txt'
meta = pd.read_csv(raw_meta_filepath, sep='\t')
# Filter raw and meta datasets by freeway and direction
# Reduce raw_5_min data
keyval_dict = {"District": 11,
"Freeway #": _fwy,
"Lane Type": 'ML',
"Direction": _dir}
redux_5_min = reduce_data_by_dict(raw_5_min_data, keyval_dict)
#Reduce meta
keyval_dict = {"District": 11,
"Fwy": _fwy,
"Dir": _dir}
redux_meta = reduce_data_by_dict(meta, keyval_dict)
# Create helper columns
raw_5_min_data['time'] = pd.to_datetime(raw_5_min_data['Timestamp'], format="%m/%d/%Y %H:%M:%S")
raw_5_min_data['timeOfDay'] = raw_5_min_data['time'].apply(lambda x: x.strftime("%H:%M"))
raw_5_min_data['Weekday'] = raw_5_min_data['time'].dt.weekday
# Keep only columns used
redux_5_min = raw_5_min_data[['Total Flow', 'Avg Occupancy', 'Avg Speed', 'Weekday', 'Station', 'timeOfDay', 'District','Freeway #', 'Direction']]
redux_meta = redux_meta[['ID', 'Fwy','District', 'Dir','Abs_PM']]
# Get the Station's Absolute marker by merging meta
freeway = redux_5_min.merge(redux_meta,
left_on=['Station', 'District','Freeway #', 'Direction'],
right_on=['ID', 'District', 'Fwy', 'Dir'])
# Export for later use
if output_csv:
freeway.to_csv(filepath)
return freeway
i5S = get_fwy_data(5,"S", overwrite=False)
i5S
| Total Flow | Avg Occupancy | Avg Speed | Weekday | Station | timeOfDay | District | Freeway # | Direction | ID | Fwy | Dir | Abs_PM | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | NaN | NaN | 3 | 1100348 | 00:00 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 1 | 6 | NaN | NaN | 3 | 1100348 | 00:05 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 2 | 30 | NaN | NaN | 3 | 1100348 | 00:10 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 3 | NaN | NaN | NaN | 3 | 1100348 | 00:15 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 4 | 6 | NaN | NaN | 3 | 1100348 | 00:20 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 5 | 12 | NaN | NaN | 3 | 1100348 | 00:25 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 6 | 11 | NaN | NaN | 3 | 1100348 | 00:30 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 7 | 11 | NaN | NaN | 3 | 1100348 | 00:35 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 8 | 11 | NaN | NaN | 3 | 1100348 | 00:40 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 9 | 13 | NaN | NaN | 3 | 1100348 | 00:45 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 10 | 16 | NaN | NaN | 3 | 1100348 | 00:50 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 11 | 10 | NaN | NaN | 3 | 1100348 | 00:55 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 12 | NaN | NaN | NaN | 3 | 1100348 | 01:00 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 13 | 0 | NaN | NaN | 3 | 1100348 | 01:05 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 14 | 9 | NaN | NaN | 3 | 1100348 | 01:10 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 15 | NaN | NaN | NaN | 3 | 1100348 | 01:15 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 16 | NaN | NaN | NaN | 3 | 1100348 | 01:20 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 17 | 14 | NaN | NaN | 3 | 1100348 | 01:25 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 18 | 13 | NaN | NaN | 3 | 1100348 | 01:30 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 19 | 16 | NaN | NaN | 3 | 1100348 | 01:35 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 20 | 5 | NaN | NaN | 3 | 1100348 | 01:40 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 21 | 11 | NaN | NaN | 3 | 1100348 | 01:45 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 22 | 6 | NaN | NaN | 3 | 1100348 | 01:50 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 23 | 8 | NaN | NaN | 3 | 1100348 | 01:55 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 24 | 7 | NaN | NaN | 3 | 1100348 | 02:00 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 25 | 10 | NaN | NaN | 3 | 1100348 | 02:05 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 26 | 9 | NaN | NaN | 3 | 1100348 | 02:10 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 27 | 10 | NaN | NaN | 3 | 1100348 | 02:15 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 28 | 14 | NaN | NaN | 3 | 1100348 | 02:20 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| 29 | 4 | NaN | NaN | 3 | 1100348 | 02:25 | 11 | 5 | S | 1100348 | 5 | S | 18.822 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19051479 | 5 | 0.0106 | 64.6 | 3 | 1123134 | 21:30 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051480 | 7 | 0.0146 | 64.8 | 3 | 1123134 | 21:35 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051481 | 5 | 0.0095 | 65.1 | 3 | 1123134 | 21:40 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051482 | 12 | 0.0256 | 64.6 | 3 | 1123134 | 21:45 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051483 | 11 | 0.0236 | 64.4 | 3 | 1123134 | 21:50 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051484 | 9 | 0.0200 | 64.1 | 3 | 1123134 | 21:55 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051485 | 4 | 0.0096 | 64.2 | 3 | 1123134 | 22:00 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051486 | 10 | 0.0221 | 63.8 | 3 | 1123134 | 22:05 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051487 | 5 | 0.0099 | 64.5 | 3 | 1123134 | 22:10 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051488 | 5 | 0.0106 | 64.7 | 3 | 1123134 | 22:15 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051489 | 6 | 0.0129 | 64.7 | 3 | 1123134 | 22:20 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051490 | 8 | 0.0146 | 65.5 | 3 | 1123134 | 22:25 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051491 | 10 | 0.0228 | 64.0 | 3 | 1123134 | 22:30 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051492 | 7 | 0.0151 | 64.3 | 3 | 1123134 | 22:35 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051493 | 4 | 0.0065 | 65.1 | 3 | 1123134 | 22:40 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051494 | 7 | 0.0162 | 64.4 | 3 | 1123134 | 22:45 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051495 | 10 | 0.0204 | 64.9 | 3 | 1123134 | 22:50 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051496 | 5 | 0.0105 | 64.9 | 3 | 1123134 | 22:55 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051497 | 3 | 0.0061 | 65.0 | 3 | 1123134 | 23:00 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051498 | 8 | 0.0155 | 65.4 | 3 | 1123134 | 23:05 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051499 | 4 | 0.0090 | 65.0 | 3 | 1123134 | 23:10 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051500 | 3 | 0.0062 | 65.0 | 3 | 1123134 | 23:15 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051501 | 2 | 0.0038 | 65.1 | 3 | 1123134 | 23:20 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051502 | 3 | 0.0062 | 65.1 | 3 | 1123134 | 23:25 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051503 | 4 | 0.0094 | 64.8 | 3 | 1123134 | 23:30 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051504 | 4 | 0.0079 | 65.0 | 3 | 1123134 | 23:35 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051505 | 2 | 0.0041 | 65.0 | 3 | 1123134 | 23:40 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051506 | 2 | 0.0045 | 64.9 | 3 | 1123134 | 23:45 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051507 | 4 | 0.0082 | 65.0 | 3 | 1123134 | 23:50 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
| 19051508 | 3 | 0.0046 | 65.4 | 3 | 1123134 | 23:55 | 11 | 5 | S | 1123134 | 5 | S | 5.324 |
19051509 rows × 13 columns
Using the existing freeway, this function restricts the dataset to a specific day of the week and writes the results to a csv
def get_fwy_dataByDay(df, _daynum, overwrite=False, output_csv=False):
weekday = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
myname = "".join([ "i",
str(df['Freeway #'].unique()[0]),
df['Direction'].unique()[0],
"_",
weekday[_daynum]])
# Check to see if the file should be overwritten or if it is not available.
try:
if overwrite == False:
freewayByDay = pd.read_csv(filepath, sep='\t')
except:
overwrite = True
if overwrite:
keyval_dict = {"Weekday": _daynum}
freewayByDay = reduce_data_by_dict(df, keyval_dict)
if output_csv:
freewayByDay.to_csv(filepath)
return freewayByDay
This creates the mean, std, mean+std, mean-std metrics for the freeway, freeway direction and/or day.
def create_freeway_stats(df):
index_fields = ['timeOfDay', 'Abs_PM']
stats = ['mean','std']
#Reduce columns of dataset only to columns of interest
fwy_grouping = df[['Total Flow', 'Avg Occupancy', 'Avg Speed', 'timeOfDay', 'Abs_PM'] ].groupby(index_fields)
metrics = fwy_grouping.agg([np.mean, np.std])
metric_stats = ['mean+std', 'mean', 'mean-std', 'std']
new_cols = []
for a in ['Total Flow', 'Avg Occupancy', 'Avg Speed']:
for b in metric_stats:
name = "_".join([a.replace(" ", ""),b])
#print name
new_cols.append(name)
if b == 'mean+std':
metrics[name] = metrics[a]['mean']+ metrics[a]['std']
elif b == 'mean-std':
metrics[name] = metrics[a]['mean']- metrics[a]['std']
elif b == 'mean':
metrics[name] = metrics[a]['mean']
else:
metrics[name] = metrics[a]['std']
fields = new_cols+index_fields
metrics = metrics.reset_index()
metrics = metrics[fields]
return metrics
This dict establishes common thresholds for heatmaps so that when comparing across days for a specific freeway, the colormap is consistent.
threshold = {}
threshold['Total Flow'] = [20, 650]
threshold['Avg Speed'] = [20, 85]
threshold['Avg Occupancy'] = [.02, 0.45]
cmap = sns.diverging_palette(h_neg=0, h_pos=260, s=99, l=10,as_cmap=True, center='light')
This function creates the heatmaps for the mean, mean+std and mean-std statistics for a given metric.
def plot_stats(df, stat, minmax, _title):
f, axes = plt.subplots(1, 3, figsize=(15, 15), sharey=True)
f.suptitle(_title)
sns.despine(left=True)
sns.set(context="paper", font="monospace")
all_stats = stat.replace(" ", "")+'_mean'
all_stats = [all_stats+"+std", all_stats, all_stats+"-std"]
i=0
for a in all_stats:
sns.heatmap(df.pivot("Abs_PM", "timeOfDay", a),
vmin = minmax[0],
vmax = minmax[1],
ax=axes[i],
xticklabels=12,
cmap=cmap,
cbar=True,
cbar_kws = {'orientation': 'horizontal'})
axes[i].set_title(a)
i += 1
f.savefig('../images/'+_title+'.pdf', bbox_inches='tight')
return None
This function creates the heatmaps for the std across the Total Flow, Avg Occupancy and Avg Speed metrics.
def plot_std_only(df, metrics, _title):
f, axes = plt.subplots(1, 3, figsize=(15, 15), sharey=True)
f.suptitle(_title)
sns.despine(left=True)
sns.set(context="paper", font="monospace")
i=0
for a in metrics:
name= a.replace(" ","")+"_std"
sns.heatmap(df.pivot("Abs_PM", "timeOfDay", name),
ax=axes[i],
xticklabels=12,
cmap=cmap,
cbar=True,
cbar_kws = {'orientation': 'horizontal'})
axes[i].set_title(a)
i += 1
f.savefig('../images/'+_title+'.pdf', bbox_inches='tight')
return None
This produces all of the heatmaps for each day of the week.
weekday = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
metrics = ['Total Flow', 'Avg Speed', 'Avg Occupancy']
for weekday_num in range(1,7):
i5S_temp = get_fwy_dataByDay(i5S, weekday_num, overwrite=True)
i5S_temp_stats = create_freeway_stats(i5S_temp)
for stat in metrics:
title = 'i5S_'+weekday[weekday_num]+"_"+stat.replace(" ", "")
plot_stats(i5S_temp_stats, stat, threshold[stat], title )
title = 'i5S_'+weekday[weekday_num]+"_std"
plot_std_only(i5S_temp_stats, metrics, title )
/Users/mikihardisty/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py:516: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). max_open_warning, RuntimeWarning)